A Variant of Azuma's Inequality for Martingales with Subgaussian Tails



Ohad Shamir
Microsoft Research New England 

ohadsh@microsoft.com



A sequence of random variables $Z_1, Z_2, \ldots$ is called a martingale difference sequence with respect to another sequence of random variables $X_1, X_2, \ldots$, if for any $t$, $Z_{t+1}$ is a function of $X_1, \ldots, X_{t+1}$, and $\mathbb{E}[Z_{t+1} \mid X_1, \ldots, X_t] = 0$ with probability 1.
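To make the definition concrete, here is a small Python sketch (an illustration only, not part of the argument). It builds a hypothetical sequence $Z_{t+1} = g(X_1, \ldots, X_t)\,X_{t+1}$ from independent fair signs $X_t \in \{-1, +1\}$, where the weight function `predictable_weight` is an arbitrary assumed choice. Since $X_{t+1}$ has mean zero given the past, $\mathbb{E}[Z_{t+1} \mid X_1, \ldots, X_t] = 0$, and the code confirms this exactly by enumerating every sign history with rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def predictable_weight(history):
    """Hypothetical weight g(X_1, ..., X_t); it only looks at the past signs."""
    return Fraction(2) if sum(history) > 0 else Fraction(1, 2)


def next_increment(history, x_next):
    """Z_{t+1} = g(X_1, ..., X_t) * X_{t+1}, a function of X_1, ..., X_{t+1}."""
    return predictable_weight(history) * x_next


def conditional_means(t):
    """Exact E[Z_{t+1} | X_1, ..., X_t] for every sign history of length t."""
    means = {}
    for history in product((-1, 1), repeat=t):
        # X_{t+1} is -1 or +1 with probability 1/2 each, independently of the past.
        means[history] = (next_increment(history, -1) + next_increment(history, 1)) / 2
    return means


print(all(m == 0 for t in range(6) for m in conditional_means(t).values()))
```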

Azuma's inequality is a useful concentration bound for martingales. Here is one possible formulation of it: 

Theorem 1 (Azuma's Inequality). Let $Z_1, Z_2, \ldots$ be a martingale difference sequence with respect to $X_1, X_2, \ldots$, and suppose there is a constant $b$ such that for any $t$,

$$\Pr(|Z_t| \le b) = 1.$$

Then for any positive integer $T$ and any $\delta > 0$, it holds with probability at least $1 - \delta$ that

$$\frac{1}{T}\sum_{t=1}^{T} Z_t \le b\sqrt{\frac{2\log(1/\delta)}{T}}.$$
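The following Python sketch evaluates the right-hand side of Theorem 1 and, as a sanity check, estimates by simulation how often the average of $T$ increments exceeds it. The increments are i.i.d. $\pm b$ signs, a simple bounded martingale difference sequence chosen only for illustration; the helper names, seed, and trial count are assumptions of the sketch.

```python
import math
import random


def azuma_bound(b, T, delta):
    """Right-hand side of Theorem 1: b * sqrt(2 log(1/delta) / T)."""
    return b * math.sqrt(2 * math.log(1 / delta) / T)


def azuma_failure_rate(b, T, delta, trials=2000, seed=0):
    """Fraction of simulated runs whose average of T increments exceeds the bound.

    Increments are i.i.d. +b or -b with equal probability (illustrative choice).
    """
    rng = random.Random(seed)
    threshold = azuma_bound(b, T, delta)
    failures = 0
    for _ in range(trials):
        avg = sum(rng.choice((-b, b)) for _ in range(T)) / T
        failures += avg > threshold
    return failures / trials


print(azuma_bound(1.0, 100, 0.05))
print(azuma_failure_rate(1.0, 100, 0.05))
```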

Sometimes, for the martingale we have at hand, $Z_t$ is not bounded, but rather bounded with high probability. In particular, suppose we can show that the probability of $Z_t$ being larger than $a$ (and smaller than $-a$), conditioned on any $X_1, \ldots, X_{t-1}$, is on the order of $\exp(-\Omega(a^2))$. Random variables with this behavior are referred to as having subgaussian tails (since their tails decay at least as fast as those of a Gaussian random variable).
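As an example, a standard Gaussian $X$ satisfies $\Pr(X > a) \le e^{-a^2/2}$ for $a > 0$ (and likewise $\Pr(X < -a)$, by symmetry), i.e., a tail bound of the form $b\exp(-ca^2)$ with $b = 1$, $c = 1/2$. The Python sketch below checks this on a finite grid via `math.erfc`; a grid check is only a sanity check, not a proof, and the function names are illustrative.

```python
import math


def gaussian_upper_tail(a):
    """Pr(X > a) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))


def satisfies_tail_condition(tail, b, c, grid):
    """Check tail(a) <= b * exp(-c * a^2) at every grid point a > 0."""
    return all(tail(a) <= b * math.exp(-c * a * a) for a in grid)


grid = [0.01 * k for k in range(1, 1001)]  # a in (0, 10]
print(satisfies_tail_condition(gaussian_upper_tail, b=1.0, c=0.5, grid=grid))
print(satisfies_tail_condition(gaussian_upper_tail, b=1.0, c=1.0, grid=grid))
```

The second call asks for a faster decay than the Gaussian actually has, so it is expected to fail on the grid.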

Intuitively, a variant of Azuma's inequality for these 'almost-bounded' martingales should still hold, and is probably known. However, we weren't able to find a convenient reference for it, and the goal of this technical report is to formally provide such a result:

Theorem 2 (Azuma's Inequality for Martingales with Subgaussian Tails). Let $Z_1, Z_2, \ldots, Z_T$ be a martingale difference sequence with respect to a sequence $X_1, X_2, \ldots, X_T$, and suppose there are constants $b \ge 1$, $c > 0$ such that for any $t$ and any $a > 0$, it holds that

$$\max\left\{\Pr(Z_t > a \mid X_1, \ldots, X_{t-1}),\ \Pr(Z_t < -a \mid X_1, \ldots, X_{t-1})\right\} \le b\exp(-ca^2).$$

Then for any $\delta > 0$, it holds with probability at least $1 - \delta$ that¹

$$\frac{1}{T}\sum_{t=1}^{T} Z_t \le \sqrt{\frac{28b\log(1/\delta)}{cT}}.$$
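To see Theorem 2 in action, the Python sketch below computes its bound and estimates the failure probability by simulation for a hypothetical martingale that is not i.i.d.: $Z_t = \sigma_t G_t$, with $G_t \sim \mathcal{N}(0,1)$ independent and $\sigma_t \in \{1/2, 1\}$ chosen from the past. Because $\sigma_t \le 1$, conditionally on the past $\Pr(Z_t > a) \le e^{-a^2/2}$ (and symmetrically for $-a$), so $b = 1$ and $c = 1/2$. The construction, seed, and trial count are assumptions of this sketch, not taken from the report.

```python
import math
import random


def subgaussian_azuma_bound(b, c, T, delta):
    """Right-hand side of Theorem 2: sqrt(28 b log(1/delta) / (c T))."""
    return math.sqrt(28 * b * math.log(1 / delta) / (c * T))


def subgaussian_failure_rate(T, delta, trials=1000, seed=1):
    """Monte Carlo estimate of Pr(average of Z_1..Z_T exceeds the bound).

    Hypothetical martingale: Z_t = sigma_t * G_t, G_t ~ N(0, 1) i.i.d., where
    sigma_t depends only on G_1..G_{t-1}; then b = 1, c = 1/2 are valid.
    """
    rng = random.Random(seed)
    threshold = subgaussian_azuma_bound(1.0, 0.5, T, delta)
    failures = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(T):
            sigma = 1.0 if total > 0 else 0.5  # predictable: uses the past only
            total += sigma * rng.gauss(0.0, 1.0)
        failures += total / T > threshold
    return failures / trials


print(subgaussian_azuma_bound(1.0, 0.5, 100, 0.05))
print(subgaussian_failure_rate(100, 0.05))
```

In toy runs like this one the empirical failure rate tends to sit far below $\delta$, which is consistent with the remark that the numerical constant in the bound can likely be improved.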

Proof of Thm. 2

We begin by proving the following lemma, which bounds the moment generating function of subgaussian random 
variables. 



¹ It is quite likely that the numerical constant in the bound can be improved.
Lemma 1. Let $X$ be a random variable with $\mathbb{E}[X] = 0$, and suppose there exist a constant $b \ge 1$ and a constant $c > 0$ such that for all $t > 0$, it holds that

$$\max\{\Pr(X > t), \Pr(X < -t)\} \le b\exp(-ct^2).$$

Then for any $s > 0$,

$$\mathbb{E}[e^{sX}] \le e^{7bs^2/c}.$$



Proof. We begin by noting that

$$\mathbb{E}[X^2] = \int_{t=0}^{\infty}\Pr(X^2 > t)\,dt \le \int_{t=0}^{\infty}\Pr(X > \sqrt{t})\,dt + \int_{t=0}^{\infty}\Pr(X < -\sqrt{t})\,dt \le \int_{t=0}^{\infty} 2b\exp(-ct)\,dt = \frac{2b}{c}.$$
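The first step uses the identity $\mathbb{E}[X^2] = \int_0^\infty \Pr(X^2 > t)\,dt$. The Python sketch below checks it numerically for a standard Gaussian (for which $b = 1$, $c = 1/2$ satisfy the lemma's tail condition), comparing the tail integral with the exact value $\mathbb{E}[X^2] = 1$ and with the bound $2b/c = 4$. The substitution $t = u^2$ and the trapezoidal rule are choices of this sketch.

```python
import math


def second_moment_via_tails(abs_tail, upper=10.0, n=20000):
    """E[X^2] = int_0^inf Pr(X^2 > t) dt, computed after substituting t = u^2.

    With t = u^2 the integral becomes int_0^inf 2 u Pr(|X| > u) du, which is
    smooth; it is approximated by the trapezoidal rule on [0, upper].
    """
    h = upper / n
    values = [2 * (k * h) * abs_tail(k * h) for k in range(n + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def gaussian_abs_tail(u):
    """Pr(|X| > u) for X ~ N(0, 1)."""
    return math.erfc(u / math.sqrt(2))


estimate = second_moment_via_tails(gaussian_abs_tail)
b, c = 1.0, 0.5  # tail constants valid for the standard Gaussian
print(estimate, 2 * b / c)
```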

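Using this, the fact that $\mathbb{E}[X] = 0$, and the fact that $e^a \le 1 + a + a^2$ for all $a \le 1$, we have that

$$
\begin{aligned}
\mathbb{E}[e^{sX}] &= \mathbb{E}\left[e^{sX} \,\middle|\, sX \le 1\right]\Pr(sX \le 1) + \sum_{j=1}^{\infty}\mathbb{E}\left[e^{sX} \,\middle|\, j < sX \le j+1\right]\Pr(j < sX \le j+1)\\
&\le \mathbb{E}\left[1 + sX + s^2X^2 \,\middle|\, sX \le 1\right]\Pr(sX \le 1) + \sum_{j=1}^{\infty} e^{j+1}\Pr\left(X > \frac{j}{s}\right)\\
&\le 1 + s^2\,\mathbb{E}[X^2] + b\sum_{j=1}^{\infty} e^{j+1-cj^2/s^2}\\
&\le 1 + \frac{2bs^2}{c} + b\sum_{j=1}^{\infty} e^{j(2-jc/s^2)}. \qquad (1)
\end{aligned}
$$

Here the third line uses $\Pr(sX \le 1) \le 1$, the identity $\mathbb{E}[sX\,\mathbf{1}\{sX \le 1\}] = -\mathbb{E}[sX\,\mathbf{1}\{sX > 1\}] \le 0$ (since $\mathbb{E}[X] = 0$), and the tail assumption; the last line uses $\mathbb{E}[X^2] \le 2b/c$ and $j + 1 \le 2j$ for $j \ge 1$.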
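The elementary inequality $e^a \le 1 + a + a^2$ is only claimed for $a \le 1$, which is why the expectation is split at $sX = 1$. A quick Python grid check (illustrative, not a proof) shows it holding on $[-10, 1]$ and failing at $a = 2$, where $e^2 \approx 7.39 > 7$.

```python
import math


def quadratic_upper_bound_holds(a, tol=1e-12):
    """Check e^a <= 1 + a + a^2, allowing a tiny floating-point tolerance."""
    return math.exp(a) <= 1 + a + a * a + tol


grid = [-10 + 0.001 * k for k in range(11001)]  # a in [-10, 1]
print(all(quadratic_upper_bound_holds(a) for a in grid))
print(quadratic_upper_bound_holds(2.0))  # outside the claimed range the bound fails
```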



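We now need to bound the series $\sum_{j=1}^{\infty} e^{j(2-jc/s^2)}$. If $s \le \sqrt{c}/2$, then $c/s^2 \ge 4$, so $2 - jc/s^2 \le 2 - c/s^2 \le -c/(2s^2)$, and hence

$$e^{j(2-jc/s^2)} \le e^{-jc/(2s^2)}$$

for all $j \ge 1$. Therefore, the series can be upper bounded by the convergent geometric series

$$\sum_{j=1}^{\infty} e^{-jc/(2s^2)} = \frac{e^{-c/(2s^2)}}{1 - e^{-c/(2s^2)}} \le 2e^{-c/(2s^2)} \le \frac{4s^2}{c},$$

where we used the upper bound $e^{-c/(2s^2)} \le e^{-2} \le 1/2$ in the second transition, and the last transition is by the inequality $e^{-a} \le 1/a$ for all $a > 0$. Overall, we get that if $s \le \sqrt{c}/2$, then

$$\mathbb{E}[e^{sX}] \le 1 + \frac{2bs^2}{c} + \frac{4bs^2}{c} = 1 + \frac{6bs^2}{c} \le e^{6bs^2/c}. \qquad (2)$$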
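The Python sketch below numerically checks the series bound $\sum_{j \ge 1} e^{j(2 - jc/s^2)} \le 4s^2/c$ and Eq. (2) on a grid of $s \le \sqrt{c}/2$, with illustrative constants $b = 1$, $c = 1$. Truncating the series is harmless in this regime, since each term is at most $e^{-jc/(2s^2)} \le e^{-2j}$.

```python
import math


def series_partial_sum(s, c, terms=200):
    """Partial sum of sum_{j>=1} exp(j * (2 - j * c / s^2))."""
    r = c / (s * s)
    return sum(math.exp(j * (2 - j * r)) for j in range(1, terms + 1))


def small_s_bounds_hold(b, c, s_values):
    """Check the series bound and Eq. (2) for each s <= sqrt(c) / 2."""
    for s in s_values:
        total = series_partial_sum(s, c)
        if total > 4 * s * s / c:
            return False
        if 1 + 2 * b * s * s / c + b * total > math.exp(6 * b * s * s / c):
            return False
    return True


b, c = 1.0, 1.0
s_values = [0.005 * k for k in range(1, 101)]  # s in (0, 0.5], i.e. (0, sqrt(c)/2]
print(small_s_bounds_hold(b, c, s_values))
```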



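We will now deal with the case $s > \sqrt{c}/2$. For all $j > 3s^2/c$, we have $2 - jc/s^2 < -1$, so the tail of the series satisfies

$$\sum_{j > 3s^2/c} e^{j(2-jc/s^2)} \le \sum_{j=0}^{\infty} e^{-j} \le 2 < \frac{8s^2}{c},$$

where the last inequality uses $s > \sqrt{c}/2$. Moreover, the function $j \mapsto j(2 - jc/s^2)$ is maximized at $j = s^2/c$, and therefore $e^{j(2-jc/s^2)} \le e^{s^2/c}$ for all $j$. Therefore, the initial part of the series, which has at most $3s^2/c$ terms, is at most

$$\sum_{1 \le j \le 3s^2/c} e^{j(2-jc/s^2)} \le \frac{3s^2}{c}\,e^{s^2/c} \le e^{3s^2/(ec)}\,e^{s^2/c} = e^{(1+3/e)s^2/c},$$

where the second to last transition is from the fact that $a \le e^{a/e}$ for all $a$.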
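For $s > \sqrt{c}/2$ the argument splits the same series at $j = 3s^2/c$: the tail is at most $8s^2/c$, and the initial part is at most $\frac{3s^2}{c}e^{s^2/c} \le e^{(1+3/e)s^2/c}$, the last step following from $a \le e^{a/e}$ with $a = 3s^2/c$. The Python sketch below checks both bounds numerically over a range of $s$ for the illustrative choice $c = 1$; the tail cutoff `extra` is an assumption of the sketch, safe because every tail term lies below $e^{-j}$.

```python
import math


def split_series(s, c, extra=200):
    """Return (initial, tail) of sum_{j>=1} exp(j (2 - j c / s^2)), split at 3 s^2 / c."""
    r = c / (s * s)
    split = math.floor(3 / r)  # largest integer j with j <= 3 s^2 / c
    initial = sum(math.exp(j * (2 - j * r)) for j in range(1, split + 1))
    # Past the split every term is below exp(-j), so `extra` more terms suffice numerically.
    tail = sum(math.exp(j * (2 - j * r)) for j in range(split + 1, split + extra + 1))
    return initial, tail


def large_s_bounds_hold(c, s_values):
    """Check tail <= 8 s^2 / c and initial <= exp((1 + 3/e) s^2 / c) for each s."""
    for s in s_values:
        initial, tail = split_series(s, c)
        if tail > 8 * s * s / c:
            return False
        if initial > math.exp((1 + 3 / math.e) * s * s / c):
            return False
    return True


c = 1.0
s_values = [0.5 + 0.05 * k for k in range(1, 200)]  # s in (sqrt(c)/2, 10.45]
print(large_s_bounds_hold(c, s_values))
```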



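Overall, we get that if $s > \sqrt{c}/2$, then

$$\mathbb{E}[e^{sX}] \le 1 + \frac{2bs^2}{c} + b\left(\frac{8s^2}{c} + e^{(1+3/e)s^2/c}\right) = 1 + \frac{10bs^2}{c} + b\,e^{(1+3/e)s^2/c} \le e^{7bs^2/c}, \qquad (3)$$

where the last transition follows from the easily verified fact that $1 + 10ba + b\,e^{(1+3/e)a} \le e^{7ba}$ for any $a \ge 1/4$ and $b \ge 1$, applied with $a = s^2/c$; indeed $s^2/c > 1/4$ by the assumption on $s$, and $b \ge 1$ by the assumption of the lemma. Combining Eq. (2) and Eq. (3) to handle the different cases of $s$ (and noting that $e^{6bs^2/c} \le e^{7bs^2/c}$), the result follows. □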
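The final step for $s > \sqrt{c}/2$ combines the pieces via an elementary inequality; with the initial part bounded by $e^{(1+3/e)s^2/c}$, the inequality needed is $1 + 10ba + b\,e^{(1+3/e)a} \le e^{7ba}$ for $a \ge 1/4$, $b \ge 1$. Below is a Python grid check of it, followed by a check of the lemma's conclusion for a Rademacher variable ($X = \pm 1$ with equal probability). For that variable, $b = 1$ and $c = \ln 2$ satisfy the tail condition, since $\Pr(X > t) = 1/2 \le e^{-t^2 \ln 2}$ for $0 < t < 1$ and $\Pr(X > t) = 0$ for $t \ge 1$. Grid checks illustrate but do not prove the claims, and the grid ranges are arbitrary choices.

```python
import math

K = 1 + 3 / math.e  # exponent constant in the bound on the initial part of the series


def elementary_inequality_holds(a, b):
    """Check 1 + 10 b a + b e^{K a} <= e^{7 b a}."""
    return 1 + 10 * b * a + b * math.exp(K * a) <= math.exp(7 * b * a)


a_grid = [0.25 + 0.01 * i for i in range(500)]  # a in [0.25, 5.24]
b_grid = [1 + 0.1 * i for i in range(91)]       # b in [1, 10]
grid_ok = all(elementary_inequality_holds(a, b) for a in a_grid for b in b_grid)

# Lemma 1 for a Rademacher variable: E[e^{sX}] = cosh(s), with b = 1, c = ln 2.
b, c = 1.0, math.log(2)
lemma_ok = all(math.cosh(s) <= math.exp(7 * b * s * s / c)
               for s in [0.01 * k for k in range(1, 301)])  # s in (0, 3]
print(grid_ok, lemma_ok)
```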

After proving the lemma, we turn to the proof of Thm. 2.

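Proof of Thm. 2. We proceed by the standard Chernoff method. Using Markov's inequality and Lemma 1, we have for any $s > 0$ that

$$
\begin{aligned}
\Pr\left(\frac{1}{T}\sum_{t=1}^{T} Z_t > \epsilon\right) &= \Pr\left(e^{s\sum_{t=1}^{T} Z_t} > e^{sT\epsilon}\right) \le e^{-sT\epsilon}\,\mathbb{E}\left[\prod_{t=1}^{T} e^{sZ_t}\right]\\
&= e^{-sT\epsilon}\,\mathbb{E}\left[\mathbb{E}\left[e^{sZ_T} \,\middle|\, X_1, \ldots, X_{T-1}\right]\prod_{t=1}^{T-1} e^{sZ_t}\right]\\
&\le e^{-sT\epsilon}\,e^{7bs^2/c}\,\mathbb{E}\left[\prod_{t=1}^{T-1} e^{sZ_t}\right]\\
&\le \cdots \le e^{-sT\epsilon + 7Tbs^2/c}.
\end{aligned}
$$

Here the second line uses the fact that $Z_1, \ldots, Z_{T-1}$ are functions of $X_1, \ldots, X_{T-1}$; the third line applies Lemma 1 to $Z_T$ conditioned on $X_1, \ldots, X_{T-1}$ (it has zero conditional mean and satisfies the tail condition of Thm. 2); and repeating this argument for $Z_{T-1}, \ldots, Z_1$ gives the last line.

Choosing $s = c\epsilon/(14b)$, the expression above equals $e^{-cT\epsilon^2/(28b)}$, and we get that

$$\Pr\left(\frac{1}{T}\sum_{t=1}^{T} Z_t > \epsilon\right) \le e^{-cT\epsilon^2/(28b)}.$$

Setting the r.h.s. to $\delta$ and solving for $\epsilon$, the theorem follows.

□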
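The choice of $s$ in the last step minimizes the convex quadratic exponent $-sT\epsilon + 7Tbs^2/c$. The Python sketch below checks, for illustrative constants, that the exponent at $s = c\epsilon/(14b)$ equals $-cT\epsilon^2/(28b)$, that nearby values of $s$ do no better, and that $\epsilon = \sqrt{28b\log(1/\delta)/(cT)}$ makes the bound equal to $\delta$. The function names are hypothetical helpers.

```python
import math


def chernoff_exponent(s, eps, b, c, T):
    """Exponent in the bound exp(-s T eps + 7 T b s^2 / c) from the proof."""
    return -s * T * eps + 7 * T * b * s * s / c


def best_s(eps, b, c):
    """Minimizer of the quadratic exponent in s: s = c eps / (14 b)."""
    return c * eps / (14 * b)


def epsilon_for(delta, b, c, T):
    """Solve exp(-c T eps^2 / (28 b)) = delta for eps, giving Theorem 2's bound."""
    return math.sqrt(28 * b * math.log(1 / delta) / (c * T))


b, c, T, delta = 2.0, 0.5, 1000, 0.01  # illustrative constants
eps = epsilon_for(delta, b, c, T)
exponent = chernoff_exponent(best_s(eps, b, c), eps, b, c, T)
print(exponent, -c * T * eps ** 2 / (28 * b), math.exp(exponent))
```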